Validity and reliability
What Are Validity and Reliability in Testing?

In testing, validity is the extent to which a test measures what it is intended to measure. The types of validity can be divided into logical and empirical.

Content Validity: To find out whether the entire content of the behavior, construct, or area is represented in the test, we compare the test tasks with the content of the behavior. This is a logical method, not an empirical one. For example, if we want to test knowledge of American geography, it is not fair to limit most questions to the geography of New England.

Face Validity: Face validity refers to the degree to which a test appears to measure what it purports to measure.

Criterion-Oriented or Predictive Validity: When a future performance is to be predicted from the scores currently obtained on the measure, correlate those scores with the later performance. The later performance is called the criterion and the current score is the predictor. This is an empirical check on the value of the test: a criterion-oriented or predictive validation.

Concurrent Validity: Concurrent validity is the degree to which the scores on a test are related to the scores on another, already established test administered at the same time, or to some other valid criterion available at the same time. For example, when a new, simple test is to be used in place of an old, cumbersome one that is considered useful, measurements are obtained on both at the same time. Logically, predictive and concurrent validation are the same; the term concurrent validation is used to indicate that no time elapsed between the measures.

Construct Validity: Construct validity is the degree to which a test measures an intended hypothetical construct. Psychologists often assess or measure abstract attributes or constructs.
The process of validating the interpretations about that construct as indicated by the test score is construct validation. This can be done experimentally. For example, to validate a measure of anxiety, we start from the hypothesis that anxiety increases when subjects are under the threat of an electric shock; if the measure is valid, the threat of an electric shock should then increase anxiety scores.

Reliability

Reliability, on the other hand, is the index that indicates how accurately the examination measures a candidate's skills. It is a necessary condition for examination validity. Research requires dependable measurement. Measurements are reliable to the extent that they are repeatable; any random influence that tends to make measurements differ from occasion to occasion or from circumstance to circumstance is a source of measurement error. Reliability is the degree to which a test consistently measures whatever it measures. The errors of measurement that affect reliability are random errors, whereas those that affect validity are systematic or constant errors.

Reliability refers to a measure's ability to capture an individual's true score, i.e. to distinguish accurately one person from another. While a reliable measure will be consistent, consistency can be seen as a by-product of reliability: in a case of perfect consistency (everyone scores the same and gets the same score repeatedly), reliability coefficients could not be calculated, because there would be no variance or covariance from which to compute a correlation. The error in our analyses is due not only to individual differences but also to the measure's lack of perfect reliability.

Criteria of reliability
• Test-retest
• Test components (internal consistency)

Test-retest reliability
Consistency of measurement for individuals over time: individuals score similarly on different occasions, e.g. today and 6 months from now.

Issues
• Memory. If the two administrations are too close in time, the correlation between scores is due to memory of item responses rather than to the true score captured.
• Chance covariation. In any sample, two variables will almost always show some non-zero correlation.

Reliability is not constant across subsets of a population. General IQ scores have good reliability, but IQ scores for college students are less reliable: the range is restricted, so there are fewer individual differences.

Test-retest Reliability: Test-retest reliability is the degree to which scores are consistent over time. It indicates score variation that occurs from testing session to testing session as a result of errors of measurement. Problems: memory, maturation, learning.

Equivalent-Forms or Alternate-Forms Reliability: This method uses two tests that are identical in every way except for the actual items included. It is used when test takers are likely to recall responses made during the first session and when alternate forms are available. Correlate the two sets of scores. The obtained coefficient is called the coefficient of equivalence when the two forms are administered at essentially the same time, and the coefficient of stability and equivalence when they are administered at different times. Problem: the difficulty of constructing two forms that are essentially equivalent.

Both of the above require two administrations.

Split-Half Reliability: This method requires only one administration and is especially appropriate when the test is very long. The most commonly used way to split the test in two is the odd-even strategy. Since longer tests tend to be more reliable, and since split-half reliability represents the reliability of a test only half as long as the actual test, a correction formula (the Spearman-Brown formula) must be applied to the coefficient. Split-half reliability is a form of internal consistency reliability.

Rational Equivalence Reliability: Rational equivalence reliability is not established through correlation but rather estimates internal consistency by determining how all items on a test relate to all other items and to the total test.
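The odd-even split and the length correction described for split-half reliability above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive procedure: the source does not name the correction formula, so the sketch assumes the Spearman-Brown formula, r_full = 2 * r_half / (1 + r_half), which is the usual choice. The function names and the small data set are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def split_half_reliability(item_scores):
    """Odd-even split-half reliability with a length correction.

    item_scores: one list per test taker, one numeric score per item.
    Returns (correlation between halves, corrected full-test estimate).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Assumed correction: Spearman-Brown estimate for a test twice as long
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical data: 5 test takers, 6 items scored right (1) or wrong (0)
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(scores)
print(f"half-test r = {r_half:.3f}, corrected = {r_full:.3f}")
```

The same pearson_r helper would apply to test-retest or equivalent-forms data, where the two lists are the scores from the two administrations. If every test taker had the same half-test total, the variance would be zero and the correlation could not be computed, which matches the point made earlier about perfect consistency.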
Internal Consistency Reliability: Internal consistency is estimated by determining how all items on the test relate to all other items. The Kuder-Richardson formula gives an estimate of reliability that is essentially equivalent to the average of the split-half reliabilities computed for all possible halves.

Standard Error of Measurement: Reliability can also be expressed in terms of the standard error of measurement, an estimate of how often you can expect errors of a given size.

Test-retest, equivalent-forms, and split-half reliability are all determined through correlation.
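As a rough illustration of the last two ideas, the sketch below computes a Kuder-Richardson estimate and a standard error of measurement for a small set of right/wrong item scores. Several things here are assumptions rather than statements from the text above: the text names only "Kuder-Richardson", and the sketch assumes the KR-20 variant, KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores); the standard error of measurement is taken as the usual textbook form, SD * sqrt(1 - reliability); and the function names and data are hypothetical.

```python
from statistics import pstdev, pvariance


def kr20(item_scores):
    """KR-20 estimate (assumed variant) for items scored 0 or 1.

    item_scores: one list per test taker, one 0/1 score per item.
    """
    n = len(item_scores)      # number of test takers
    k = len(item_scores[0])   # number of items
    totals = [sum(row) for row in item_scores]
    total_var = pvariance(totals)
    if total_var == 0:
        # Everyone has the same total: no variance, so no coefficient
        raise ValueError("total scores have zero variance")
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # proportion correct
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_var)


def standard_error_of_measurement(total_scores, reliability):
    """Assumed textbook form: SD of the scores times sqrt(1 - reliability)."""
    return pstdev(total_scores) * (1 - reliability) ** 0.5


# Hypothetical data: 5 test takers, 6 items scored right (1) or wrong (0)
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = kr20(scores)
sem = standard_error_of_measurement([sum(row) for row in scores], reliability)
print(f"KR-20 = {reliability:.3f}, SEM = {sem:.3f}")
```

Unlike the split-half approach, this estimate needs no correlation between two sets of scores: it works from the item-level proportions and the variance of the total scores in a single administration.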